Analysis: Studying the Properties of Topic Models with Different Alpha Values

In a topic model, the hyper-parameter alpha is the Dirichlet prior on each document's topic distribution, so it dictates how topics are distributed across the documents. A higher alpha encourages each document to draw on many topics, so a given topic is spread more widely across the documents. A lower alpha concentrates each document on a few topics, so a given topic is spread more narrowly. Wallach et al. (2009) argue that careful attention to the setting of alpha is important for building a robust topic model. Yet many studies that use topic models for various purposes simply leave alpha at its default value (e.g., Carron-Arthur et al., 2016; Székely and vom Brocke, 2017). In gensim, the default value for alpha is 'symmetric', which means that every topic receives the same alpha value. gensim calculates this symmetric value by dividing 1.0 by the number of topics in the model, so for a model with 75 topics alpha is set to 1/75 ≈ 0.0133.
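The symmetric default described above can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic, not gensim's own code. The helper name `symmetric_alpha` is hypothetical. gensim stores alpha as a float32 array, which is why the value prints as 0.01333333 in the output further down.

```python
def symmetric_alpha(num_topics):
    """Return a symmetric Dirichlet prior: the same value, 1/num_topics, for every topic."""
    if num_topics <= 0:
        raise ValueError("num_topics must be positive")
    return [1.0 / num_topics] * num_topics


alpha = symmetric_alpha(75)
print(len(alpha), round(alpha[0], 4))  # 75 0.0133
```

Because every entry is 1/K, the prior always sums to 1 regardless of the number of topics. This means that adding topics makes each individual alpha smaller.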

Here I analyze the properties of topic models with three different alpha values:

  • model_alpha_symmetric: This model uses gensim's default setting for alpha, which for 75 topics gives 1/75 ≈ 0.0133. The number of topics is 75 and the model is based on a noun-only version of the corpus.
  • model_alpha_auto: This model has alpha set to 'auto', which means that gensim learns a separate value for each topic from the corpus, resulting in an asymmetric alpha. See below for this model's alpha values. The number of topics is 75 and the model is based on a noun-only version of the corpus.
  • model_alpha_05: This model has an alpha value of 0.5 for every topic (symmetric). This is a deliberately high value, intended to show how a high alpha affects the model. The number of topics is 75 and the model is based on a noun-only version of the corpus.
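The notebook loads pre-trained models, so the training code is not shown. The sketch below is a hypothetical reconstruction of how the three variants could be trained with gensim's `LdaModel`, differing only in `alpha`. The `passes` and `random_state` values are placeholders, not the author's settings. gensim is imported inside the function so that the settings table can be inspected without it.

```python
# Hypothetical reconstruction: passes and random_state are placeholder values,
# not the settings used to build the models loaded in this notebook.
ALPHA_SETTINGS = {
    "model_alpha_symmetric": "symmetric",  # gensim default: 1/num_topics per topic
    "model_alpha_auto": "auto",            # asymmetric prior learned from the corpus
    "model_alpha_05": 0.5,                 # fixed, deliberately high symmetric value
}


def train_alpha_variants(corpus, dictionary, num_topics=75, passes=10, random_state=1):
    """Train one LdaModel per alpha setting on the same corpus and dictionary."""
    # Imported lazily so ALPHA_SETTINGS can be used without gensim installed.
    from gensim.models import LdaModel

    return {
        name: LdaModel(
            corpus=corpus,
            id2word=dictionary,
            num_topics=num_topics,
            alpha=alpha,
            passes=passes,
            random_state=random_state,
        )
        for name, alpha in ALPHA_SETTINGS.items()
    }
```

Plain `LdaModel` is used here because `alpha='auto'` is not supported by gensim's `LdaMulticore`. Fixing `random_state` keeps the three runs comparable.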

Set Up: Import Packages and Load Topic Models

In [2]:
from gensim import corpora, models, similarities
import pyLDAvis.gensim  # newer pyLDAvis releases rename this module to pyLDAvis.gensim_models
import spacy
import json

path = '../noun_corpus/'

# load metadata for later use
with open('../data/doc2metadata.json', encoding='utf8', mode='r') as f:
    doc2metadata = json.load(f)
    
# load dictionary and corpus for the noun models
dictionary = corpora.Dictionary.load(path + 'noun_corpus.dict')
corpus = corpora.MmCorpus(path + 'noun_corpus.mm')

# load alpha = symmetric model
model_alpha_symmetric = models.ldamodel.LdaModel.load(path + 'noun_75.model')

# load alpha = auto model
model_alpha_auto = models.ldamodel.LdaModel.load(path + 'alphas/noun_auto.model')

# load alpha = 0.5 model
model_alpha_05 = models.ldamodel.LdaModel.load(path + 'alphas/noun_05.model')

Alpha Values

In [19]:
model_alpha_symmetric.alpha
Out[19]:
array([0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333,
       0.01333333, 0.01333333, 0.01333333, 0.01333333, 0.01333333],
      dtype=float32)
In [20]:
model_alpha_auto.alpha
Out[20]:
array([0.10270125, 0.0592977 , 0.07087024, 0.06412899, 0.05616595,
       0.04951361, 0.03805129, 0.22600546, 0.12083639, 0.05507494,
       0.06248622, 0.06897874, 0.04855374, 0.00229159, 0.05208761,
       0.03678485, 0.0615502 , 0.07355741, 0.04548518, 0.06304422,
       0.07587688, 0.10141291, 0.0303052 , 0.06570146, 0.07862557,
       0.03714601, 0.05216953, 0.06001518, 0.03180525, 0.05900658,
       0.0474006 , 0.0520789 , 0.05153614, 0.0444841 , 0.08849999,
       0.06562751, 0.05767829, 0.5677949 , 0.0429585 , 0.05217329,
       0.05128561, 0.05068998, 0.03617514, 0.04957202, 0.07394192,
       0.04284829, 0.05258643, 0.05349901, 0.0978125 , 0.0437447 ,
       0.0301297 , 0.05985987, 0.04824056, 0.05434861, 0.05524151,
       0.04822579, 0.05105721, 0.05232021, 0.04474712, 0.06868275,
       0.04497385, 0.13791354, 0.05414683, 0.02755224, 0.05568515,
       0.06769605, 0.04380047, 0.18338257, 0.02185606, 0.05321367,
       0.0817384 , 0.039121  , 0.11358655, 0.11281471, 0.055462  ],
      dtype=float32)
In [21]:
model_alpha_05.alpha
Out[21]:
array([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
       0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], dtype=float32)
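The three arrays above are easier to compare when summarized. The helper below is a hypothetical, minimal sketch (not part of gensim). It reports the range, mean, and position of the extreme values of any alpha vector, and whether the vector is symmetric. It accepts a plain list or a NumPy array such as `model_alpha_auto.alpha`.

```python
def summarize_alpha(alpha, tol=1e-9):
    """Describe a Dirichlet prior vector: its range, mean, extreme positions, and symmetry."""
    values = [float(a) for a in alpha]
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "mean": sum(values) / len(values),
        "argmin": values.index(lo),     # topic index with the smallest prior
        "argmax": values.index(hi),     # topic index with the largest prior
        "symmetric": hi - lo <= tol,    # True when every topic has (nearly) the same alpha
    }
```

Applied to the auto-alpha array printed above, the values run from about 0.0023 (index 13) to about 0.568 (index 37). The largest topic prior is therefore roughly 250 times the smallest. These are gensim's internal topic indices. pyLDAvis sorts topics by prevalence by default, so its topic numbers need not match them.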

Topic Coherence Test

Topic Coherence: model_alpha_symmetric

In [3]:
model_alpha_symmetric_viz = pyLDAvis.gensim.prepare(model_alpha_symmetric, corpus, dictionary)
pyLDAvis.display(model_alpha_symmetric_viz)
Out[3]: [interactive pyLDAvis visualization, not preserved in this export]

model_alpha_symmetric produced 13 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 57 topics which are coherent. Therefore its topics are:

  • 17.3% junk topics
  • 6.7% mixed topics
  • 76% coherent topics
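The percentages are each category's count divided by the 75 topics in the model. The hypothetical helper below sketches that arithmetic with rounding to one decimal place. The counts are the ones reported above.

```python
def category_percentages(counts, total=None):
    """Convert per-category topic counts into percentages rounded to one decimal place."""
    total = sum(counts.values()) if total is None else total
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts reported above for model_alpha_symmetric (75 topics in total)
print(category_percentages({"junk": 13, "mixed": 5, "coherent": 57}))
# {'junk': 17.3, 'mixed': 6.7, 'coherent': 76.0}
```

Note that 5 of 75 topics is 6.67%, which rounds to 6.7%.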

A few examples of junk topics:

  • topic 3: essay, bible, john, commentary, old, theology, james, fortress, david, paul
  • topic 9: faith, hebrews, life, sin, thought, love, sense, people, heart, judgment
  • topic 74: plate, script, hatch, harrison, index, haran, equivalent, print, earthquake, tribution

An example of mixed topics:

  • topic 44: esther, hand, foot, eye, king, garment, head, house, moore, gold (This topic could be thought of as a mix of "body" and the story of Esther).

A few examples of coherent topics:

  • topic 11 (narrative criticism): story, narrative, character, reader, narrator, account, event, motif, element, pattern
  • topic 34 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, document, cave, sect, fragment
  • topic 38 (family): son, father, child, family, mother, brother, marriage, wife, daughter, birth

Topic Coherence: model_alpha_auto

In [4]:
model_alpha_auto_viz = pyLDAvis.gensim.prepare(model_alpha_auto, corpus, dictionary)
pyLDAvis.display(model_alpha_auto_viz)
Out[4]: [interactive pyLDAvis visualization, not preserved in this export]

model_alpha_auto produced 13 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 57 topics which are coherent. Therefore its topics are:

  • 17.3% junk topics
  • 6.7% mixed topics
  • 76% coherent topics

A few examples of junk topics:

  • topic 3: essay, bible, john, commentary, old, james, fortress, theology, david, paul
  • topic 9: faith, hebrews, life, sin, thought, sense, love, people, heart, mind
  • topic 74: plate, hatch, harrison, haran, equivalent, print, script, index, earthquake, harrelson

An example of mixed topics:

  • topic 44: esther, hand, foot, eye, king, garment, head, house, moore, gold (This topic could be thought of as a mix of "body" and the story of Esther).

A few examples of coherent topics:

  • topic 11 (narrative criticism): story, narrative, character, reader, narrator, account, event, motif, element, pattern
  • topic 34 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, document, cave, sect, fragment
  • topic 37 (family): son, father, child, family, mother, brother, marriage, wife, daughter, birth

Topic Coherence: model_alpha_05

In [6]:
model_alpha_05_viz = pyLDAvis.gensim.prepare(model_alpha_05, corpus, dictionary)
pyLDAvis.display(model_alpha_05_viz)
Out[6]: [interactive pyLDAvis visualization, not preserved in this export]

model_alpha_05 produced 12 topics which lack semantic or contextual coherence, 5 topics of mixed coherence, and 58 topics which are coherent. Therefore its topics are:

  • 16% junk topics
  • 6.7% mixed topics
  • 77.3% coherent topics

A few examples of junk topics:

  • topic 3: essay, bible, commentary, fortress, scholars, james, david, old, introduction, theology
  • topic 70: esther, kin, lot, moore, garment, seal, gold, instrument, balaam, judith

An example of mixed topics:

  • topic 29 (family): son, father, child, family, brother, jacob, genesis, abraham, mother, joseph (This topic could be thought of as a mix of "family" and "Patriarchs").

A few examples of coherent topics:

  • topic 18 (narrative criticism): story, narrative, account, motif, event, episode, tale, theme, scene, joseph
  • topic 42 (dead sea scrolls): qumran, scroll, dead, sea, community, scrolls, fragment, cave, sect, document
  • topic 24 (justification): faith, christ, promise, salvation, covenant, law, righteousness, gentile, abraham, justification

Topic Coherence: Brief Discussion

Each of these models produced a similar number of coherent topics: model_alpha_symmetric and model_alpha_auto each have 57 coherent topics, and model_alpha_05 has 58. A close examination shows that the topics in these models are very similar to one another in terms of their words, especially between model_alpha_symmetric and model_alpha_auto. The models differ, however, in the order of prominence of words within a topic and in the prominence of topics in the corpus (hence the same topic can be numbered differently in the visualizations above). Interestingly, model_alpha_05 identified a topic that is important in New Testament scholarship: topic 24 (justification).
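The claim that topics are "very similar in terms of the words" can be made measurable. One option is the Jaccard similarity of two topics' top-word sets, which ignores word order. Below is a minimal sketch with a hypothetical helper name. In the notebook, the word lists could come from gensim's `model.show_topic(topicid, topn=10)`, which returns (word, probability) pairs. Here, they are the topic 9 lists quoted above.

```python
def topic_word_overlap(words_a, words_b):
    """Jaccard similarity of two topics' top-word lists, ignoring word order."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Top ten words for topic 9 as listed above for two of the models
sym_9 = ["faith", "hebrews", "life", "sin", "thought", "love", "sense", "people", "heart", "judgment"]
auto_9 = ["faith", "hebrews", "life", "sin", "thought", "sense", "love", "people", "heart", "mind"]
print(round(topic_word_overlap(sym_9, auto_9), 3))  # 0.818
```

The two lists share nine of eleven distinct words. Topic 3 in the same two models scores 1.0, since only the word order differs.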

Clustering Test

In [3]:
def cluster_test(corpus, model):
    """Report the share of documents with one, several, or no topics at probability >= 0.20."""
    docs_with_1_topic = 0
    docs_with_multiple_topics = 0
    docs_with_no_topics = 0
    total_docs = 0
    for doc in corpus:
        # keep only topics that account for at least 20% of the document
        topics = model.get_document_topics(doc, minimum_probability=0.20)
        total_docs += 1
        if len(topics) == 1:
            docs_with_1_topic += 1
        elif len(topics) > 1:
            docs_with_multiple_topics += 1
        else:
            # no topic reached the threshold
            docs_with_no_topics += 1
    print('Corpus assigned to a single topic:', (docs_with_1_topic / total_docs) * 100, '%')
    print('Corpus assigned to multiple topics:', (docs_with_multiple_topics / total_docs) * 100, '%')
    print('corpus assigned to no topics:', (docs_with_no_topics / total_docs) * 100, '%')
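The counting logic in cluster_test can be separated from gensim so that it returns numbers instead of printing them. This makes it easier to tabulate all three models or to try thresholds other than 0.20. Below is a hypothetical sketch that takes one list of (topic_id, probability) pairs per document.

```python
def cluster_shares(doc_topic_lists, threshold=0.20):
    """Percentage of documents with one, several, or no topics at or above `threshold`.

    `doc_topic_lists` holds one list of (topic_id, probability) pairs per document,
    e.g. the output of gensim's get_document_topics(doc, minimum_probability=0.0).
    """
    counts = {"single": 0, "multiple": 0, "none": 0}
    total = 0
    for topics in doc_topic_lists:
        total += 1
        n = sum(1 for _, prob in topics if prob >= threshold)
        if n == 1:
            counts["single"] += 1
        elif n > 1:
            counts["multiple"] += 1
        else:
            counts["none"] += 1
    if total == 0:
        raise ValueError("no documents supplied")
    return {key: 100 * value / total for key, value in counts.items()}
```

Used as `cluster_shares(model.get_document_topics(doc, minimum_probability=0.0) for doc in corpus)`, it should give the same shares as cluster_test for the default threshold.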

Clustering: model_alpha_symmetric

In [24]:
cluster_test(corpus, model_alpha_symmetric)
Corpus assigned to a single topic: 55.252460419341034 %
Corpus assigned to multiple topics: 30.166880616174584 %
corpus assigned to no topics: 14.58065896448438 %

Clustering: model_alpha_auto

In [25]:
cluster_test(corpus, model_alpha_auto)
Corpus assigned to a single topic: 56.84638425331622 %
Corpus assigned to multiple topics: 26.79717586649551 %
corpus assigned to no topics: 16.356439880188276 %

Clustering: model_alpha_05

In [4]:
cluster_test(corpus, model_alpha_05)
Corpus assigned to a single topic: 45.860077021822846 %
Corpus assigned to multiple topics: 5.2952503209242625 %
corpus assigned to no topics: 48.84467265725289 %

Clustering: Brief Discussion

The results of the cluster test for model_alpha_symmetric and model_alpha_auto are fairly close to one another. model_alpha_symmetric left 14.6% of the documents in the corpus unassigned to any topic, and model_alpha_auto left 16.4%. model_alpha_05 performed much worse, leaving 48.8% of the documents unassigned. This is consistent with what a high alpha encourages: each document's probability mass is spread across more topics, so fewer topics reach the 0.20 threshold used in cluster_test.
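A simplified worked example shows how a larger alpha can dilute a document's topic shares. Under a symmetric Dirichlet prior, the posterior-mean share of topic k is (n_k + alpha) / (N + K * alpha). Here n_k is the number of tokens attributed to the topic, N is the document length, and K is the number of topics. As far as we understand it, gensim's get_document_topics normalizes a vector of the same form (alpha plus expected counts). The document below (20 noun tokens, 8 of them on one topic) is invented for illustration. Real articles are usually longer, which weakens the effect, so this is at most one contributing factor.

```python
def smoothed_topic_share(topic_count, doc_length, alpha, num_topics):
    """Posterior-mean share of one topic in a document under a symmetric Dirichlet prior.

    Each of the num_topics topics receives `alpha` pseudo-counts before the
    document's own (expected) token counts are added.
    """
    return (topic_count + alpha) / (doc_length + alpha * num_topics)


# Hypothetical document: 20 noun tokens, 8 of them attributed to one topic.
for alpha in (1 / 75, 0.5):
    share = smoothed_topic_share(8, 20, alpha, 75)
    print(f"alpha={alpha:.4f}: share={share:.3f}, passes 0.20 threshold: {share >= 0.20}")
# alpha=0.0133: share=0.382, passes 0.20 threshold: True
# alpha=0.5000: share=0.148, passes 0.20 threshold: False
```

With alpha = 0.5 and 75 topics, the prior adds 37.5 pseudo-counts spread evenly over all topics. That is enough to push an otherwise dominant topic below the threshold.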

Information Retrieval Test

In [5]:
# build indices for similarity queries
index_symmetric = similarities.MatrixSimilarity(model_alpha_symmetric[corpus]) 
index_auto = similarities.MatrixSimilarity(model_alpha_auto[corpus])
index_05 = similarities.MatrixSimilarity(model_alpha_05[corpus])

# define retrieval test
def retrieval_test(new_doc, lda, index):
    new_bow = dictionary.doc2bow(new_doc)  # change new document to bag of words representation
    new_vec = lda[new_bow]  # change new bag of words to a vector
    index.num_best = 10  # set index to generate 10 best results
    matches = (index[new_vec])
    scores = []
    for match in matches:
        score = (match[1])
        scores.append(score)
        score = str(score)
        key = 'doc_' + str(match[0])
        article_dict = doc2metadata[key]
        author = article_dict['author']
        title = article_dict['title']
        year = article_dict['pub_year']
        print(key + ': ' + author.title() + ' (' + year + '). ' + title.title() + '\n\tsimilarity score -> ' + score + '\n')
    average_score = sum(scores) / len(scores)
    print('*********************************')
    print("Average similarity score ->", average_score)
    
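# set up nlp for new docs
# NOTE: spacy.load('en') and spacy.en.STOPWORDS rely on an old spaCy API; recent
# releases need a named pipeline and spacy.lang.en.stop_words instead
# (lemmas may differ slightly from those produced by the original pipeline)
from spacy.lang.en.stop_words import STOP_WORDS
nlp = spacy.load('en_core_web_sm')
stop_words = STOP_WORDS

NOUN_TAGS = {'NN', 'NNS', 'NNP'}  # singular, plural, and proper nouns (NNPS is not included)

def get_noun_lemmas(text):
    """Return the lemmas of alphabetic, non-stopword nouns in text."""
    doc = nlp(text)
    noun_lemmas = [token.lemma_ for token in doc if token.tag_ in NOUN_TAGS and token.is_alpha]
    return [lemma for lemma in noun_lemmas if lemma not in stop_words]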

# load and process Greene, N. E. (2017)
with open('../abstracts/greene.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    greene = get_noun_lemmas(text)
    
#load and process Hollenback, G. M. (2017)
with open('../abstracts/hollenback.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    hollenback = get_noun_lemmas(text)

# load and process Dinkler, M. B. (2017)
with open('../abstracts/dinkler.txt', encoding='utf8', mode='r') as f:
    text = f.read()
    dinkler = get_noun_lemmas(text)
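gensim's MatrixSimilarity scores documents by cosine similarity between topic vectors. With `num_best = 10`, `index[new_vec]` returns the ten highest-scoring (document index, score) pairs. The plain-Python sketch below illustrates that computation on sparse (topic_id, weight) vectors for intuition only. It is not gensim's implementation, and the helper names are hypothetical.

```python
import math


def sparse_cosine(vec_a, vec_b):
    """Cosine similarity between two sparse topic vectors of (topic_id, weight) pairs.

    Topics missing from a vector (e.g. dropped below gensim's minimum_probability)
    count as zero.
    """
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(query_vec, doc_vecs, n=10):
    """Rank documents (list position -> sparse vector) by cosine similarity to the query."""
    scored = [(doc_id, sparse_cosine(query_vec, vec)) for doc_id, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]
```

Because cosine similarity ignores vector length, two documents with the same topic proportions score 1.0 even if one has more mass on the topics kept above the probability cut-off.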

Finding Articles Similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

Information Retrieval: model_alpha_symmetric

In [9]:
retrieval_test(greene, model_alpha_symmetric, index_symmetric)
doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8950629830360413

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.8928478956222534

doc_804: Peters, John P. (1921). Another Folk Song
	similarity score -> 0.8910002708435059

doc_757: Peters, John P. (1916). Ritual In The Psalms
	similarity score -> 0.8899545669555664

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8780089616775513

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8754132986068726

doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.871891438961029

doc_9314: Peters, John P. (1910). Notes On Some Ritual Uses Of The Psalms
	similarity score -> 0.8673380017280579

doc_2970: Liebreich, Leon J. (1955). The Songs Of Ascents And The Priestly Blessing
	similarity score -> 0.8623627424240112

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8536018133163452

*********************************
Average similarity score -> 0.8777481973171234

Information Retrieval: model_alpha_auto

In [10]:
retrieval_test(greene, model_alpha_auto, index_auto)
doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.9080101251602173

doc_804: Peters, John P. (1921). Another Folk Song
	similarity score -> 0.9010521769523621

doc_757: Peters, John P. (1916). Ritual In The Psalms
	similarity score -> 0.8904219269752502

doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8873319029808044

doc_8503: Gillingham, S. (1999). Review Of The Message Of The Psalter: An Eschatological Programme In The Book Of Psalms
	similarity score -> 0.886253833770752

doc_7877: Allen, Leslie C. (1989). Review Of The Identity Of The Individual In The Psalms
	similarity score -> 0.8856232166290283

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.8855308890342712

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.881596565246582

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8805809617042542

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.8636133074760437

*********************************
Average similarity score -> 0.8870014905929565

Information Retrieval: model_alpha_05

In [11]:
retrieval_test(greene, model_alpha_05, index_05)
doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
	similarity score -> 0.9100346565246582

doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
	similarity score -> 0.891493558883667

doc_8503: Gillingham, S. (1999). Review Of The Message Of The Psalter: An Eschatological Programme In The Book Of Psalms
	similarity score -> 0.8908661007881165

doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
	similarity score -> 0.8876280188560486

doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
	similarity score -> 0.8742737174034119

doc_7648: Mccann,, J. Clinton (1990). Review Of Psalms: Part I With An Introduction To Cultic Poetry
	similarity score -> 0.872532308101654

doc_7877: Allen, Leslie C. (1989). Review Of The Identity Of The Individual In The Psalms
	similarity score -> 0.8684517741203308

doc_1411: Berry, George R. (1914). The Titles Of The Psalms
	similarity score -> 0.8676559329032898

doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
	similarity score -> 0.857624351978302

doc_8075: Landes, George M. (1992). Review Of Jonah: A New Translation With Introduction, Commentary, And Interpretation
	similarity score -> 0.8552865982055664

*********************************
Average similarity score -> 0.8775847017765045

Brief Discussion: Finding articles similar to Greene, N. E. (2017). Creation, destruction, and a Psalmist's plea: rethinking the poetic structure of Psalm 74.

The average similarity scores of the top ten results for the first information retrieval task are as follows:

  • model_alpha_symmetric: average similarity score -> 87.8%
  • model_alpha_auto: average similarity score -> 88.7%
  • model_alpha_05: average similarity score -> 87.8%

These models achieved similar average similarity scores in the first information retrieval task, and each model returned documents about the Psalms. Six documents from the corpus were returned as matches for the Greene article by all three models, although they were not ranked consistently across the models:

  • doc_9217: Briggs, Charles A. (1899). An Inductive Study Of Selah
  • doc_1411: Berry, George R. (1914). The Titles Of The Psalms
  • doc_2855: Jefferson, Helen Genevieve (1952). Psalm 93
  • doc_123: Armstrong, Ryan M. (2012). Psalms Dwelling Together In Unity: The Placement Of Psalms 133 And 134 In Two Different Psalms Collections
  • doc_8205: Waltke, Bruce K. (1991). Superscripts, Postcripts, Or Both
  • doc_5418: Buss, Martin J. (1963). The Psalms Of Asaph And Korah
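Finding the documents shared by all three result lists is a set intersection. The hypothetical helper below also records each model's rank for the shared documents, which shows the inconsistency in ranking noted above. The document keys are copied from the Greene outputs earlier in this section. In the notebook they could be collected by having retrieval_test return its matches, which would be a hypothetical change.

```python
def common_matches(rankings):
    """Documents returned by every model, with each model's 1-based rank for them.

    `rankings` maps a model name to its ordered list of returned document keys.
    """
    shared = set.intersection(*(set(docs) for docs in rankings.values()))
    return {
        doc: {name: docs.index(doc) + 1 for name, docs in rankings.items()}
        for doc in sorted(shared)
    }


greene_results = {
    "symmetric": ["doc_9217", "doc_1411", "doc_804", "doc_757", "doc_2855",
                  "doc_123", "doc_8205", "doc_9314", "doc_2970", "doc_5418"],
    "auto": ["doc_8205", "doc_804", "doc_757", "doc_9217", "doc_8503",
             "doc_7877", "doc_2855", "doc_1411", "doc_123", "doc_5418"],
    "alpha_05": ["doc_8205", "doc_2855", "doc_8503", "doc_9217", "doc_123",
                 "doc_7648", "doc_7877", "doc_1411", "doc_5418", "doc_8075"],
}
for doc, ranks in common_matches(greene_results).items():
    print(doc, ranks)
```

For example, doc_8205 is ranked first by model_alpha_auto and model_alpha_05 but seventh by model_alpha_symmetric.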

Finding Articles Similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another look at Leviticus 18:22 and 20:13.

Information Retrieval: model_alpha_symmetric

In [12]:
retrieval_test(hollenback, model_alpha_symmetric, index_symmetric)
doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.7967202663421631

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.7828606367111206

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7453253865242004

doc_284: Lemos, T. M. (2006). Shame And Mutilation Of Enemies In The Hebrew Bible
	similarity score -> 0.7283123731613159

doc_1851: Kraemer, Ross S. (1985). Review Of In Memory Of Her: A Feminist Theological Reconstruction Of Christian Origins
	similarity score -> 0.7240121364593506

doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.7107797265052795

doc_143: Townsley, Jeramy (2011). Paul, The Goddess Religions, And Queer Sects: Romans 1:23—28
	similarity score -> 0.7066832780838013

doc_1974: Trible, Phyllis (1987). Review Of The Israelite Woman: Social Role And Literary Type In Biblical Narrative
	similarity score -> 0.7020665407180786

doc_6994: Corley, Kathleen E. (1996). Review Of The Double Message: Patterns Of Gender In Luke-Acts
	similarity score -> 0.6915348768234253

doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom?
	similarity score -> 0.6825031638145447

*********************************
Average similarity score -> 0.727079838514328

Information Retrieval: model_alpha_auto

In [13]:
retrieval_test(hollenback, model_alpha_auto, index_auto)
doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.7926235198974609

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.7689684629440308

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7662057280540466

doc_1851: Kraemer, Ross S. (1985). Review Of In Memory Of Her: A Feminist Theological Reconstruction Of Christian Origins
	similarity score -> 0.750426173210144

doc_1974: Trible, Phyllis (1987). Review Of The Israelite Woman: Social Role And Literary Type In Biblical Narrative
	similarity score -> 0.7313250303268433

doc_284: Lemos, T. M. (2006). Shame And Mutilation Of Enemies In The Hebrew Bible
	similarity score -> 0.7302554845809937

doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.7274610996246338

doc_6994: Corley, Kathleen E. (1996). Review Of The Double Message: Patterns Of Gender In Luke-Acts
	similarity score -> 0.7104109525680542

doc_2149: Running, Leona Glidden (1983). Review Of Il Femminismo Della Bibbia
	similarity score -> 0.7016556859016418

doc_143: Townsley, Jeramy (2011). Paul, The Goddess Religions, And Queer Sects: Romans 1:23—28
	similarity score -> 0.6972150802612305

*********************************
Average similarity score -> 0.7376547217369079

Information Retrieval: model_alpha_05

In [14]:
retrieval_test(hollenback, model_alpha_05, index_05)
doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity
	similarity score -> 0.8043299913406372

doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
	similarity score -> 0.8030654788017273

doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
	similarity score -> 0.798945426940918

doc_225: Miller, James E. (2009). A Critical Response To Karin Adams'S Reinterpretation Of Hosea 4:13-14
	similarity score -> 0.7939445972442627

doc_6466: Friedman, Mordechai A. (1980). Israel'S Response In Hosea 2:17B: "You Are My Husband"
	similarity score -> 0.7831361293792725

doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
	similarity score -> 0.7759504318237305

doc_7805: Bird, Phyllis A. (1993). Review Of Frauen Im Alten Israel: Eine Begriffsgeschichtliche Und Sozialrechtliche Studie Zur Stellung Der Frau Im Alten Testament
	similarity score -> 0.7676080465316772

doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom?
	similarity score -> 0.763600766658783

doc_9258: Kalmanofsky, Amy (2011). The Dangerous Sisters Of Jeremiah And Ezekiel
	similarity score -> 0.7518566250801086

doc_1481: Bassler, Jouette M. (1984). The Widows' Tale: A Fresh Look At 1 Tim 5:3-16
	similarity score -> 0.7388158440589905

*********************************
Average similarity score -> 0.7781253337860108

Brief Discussion: Finding articles similar to Hollenback, G. M. (2017). Who is doing what to whom revisited: Another Look at Leviticus 18:22 and 20:13.

The average similarity scores of the top ten results for the second information retrieval task are as follows:

  • model_alpha_symmetric: average similarity score -> 72.7%
  • model_alpha_auto: average similarity score -> 73.8%
  • model_alpha_05: average similarity score -> 77.8%

Each model returned documents dealing with gender and sexuality, which is appropriate given the subject of the query article. Four documents from the corpus were returned as matches for the Hollenback article by all three models:

  • doc_8995: Martin, Troy W. (2004). Paul'S Argument From Nature For The Veil In 1 Corinthians 11:13-15: A Testicle Instead Of A Head Covering
  • doc_463: Cosgrove, Charles H. (2005). A Woman'S Unbound Hair In The Greco-Roman World, With Special Reference To The Story Of The "Sinful Woman" In Luke 7:36-50
  • doc_8719: Burrus, Virginia (1999). Review Of Early Christian Women And Pagan Opinion: The Power Of The Hysterical Woman
  • doc_316: Nasrallah, Laura (2006). Review Of A Woman'S Place: House Churches In Earliest Christianity

Interestingly, all three models ranked doc_463 as the second most likely match. It is also worth noting that doc_8757: Walsh, Jerome T. (2001). Leviticus 18:22 And 20:13: Who Is Doing What To Whom? was returned as a match by model_alpha_symmetric and model_alpha_05, but not by model_alpha_auto. This is the article to which the query article is a response.
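Looking up where a particular document landed in each model's results, including models that did not return it at all, is a small extension of the same idea. Below is a hypothetical helper, applied to the Hollenback result lists copied from the outputs above.

```python
def ranks_of(doc, rankings):
    """1-based rank of `doc` in each model's result list, or None if it was not returned."""
    return {name: (docs.index(doc) + 1 if doc in docs else None) for name, docs in rankings.items()}


hollenback_results = {
    "symmetric": ["doc_8995", "doc_463", "doc_8719", "doc_284", "doc_1851",
                  "doc_316", "doc_143", "doc_1974", "doc_6994", "doc_8757"],
    "auto": ["doc_8995", "doc_463", "doc_8719", "doc_1851", "doc_1974",
             "doc_284", "doc_316", "doc_6994", "doc_2149", "doc_143"],
    "alpha_05": ["doc_316", "doc_463", "doc_8995", "doc_225", "doc_6466",
                 "doc_8719", "doc_7805", "doc_8757", "doc_9258", "doc_1481"],
}
print(ranks_of("doc_463", hollenback_results))   # {'symmetric': 2, 'auto': 2, 'alpha_05': 2}
print(ranks_of("doc_8757", hollenback_results))  # {'symmetric': 10, 'auto': None, 'alpha_05': 8}
```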

Finding Articles Similar to Dinkler, M. B. (2017). Building Character on the Road to Emmaus: Lukan Characterization in Contemporary Literary Perspective.

Information Retrieval: model_alpha_symmetric

In [15]:
retrieval_test(dinkler, model_alpha_symmetric, index_symmetric)
doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.8752881288528442

doc_7866: Lincoln, Andrew T. (1989). The Promise And The Failure: Mark 16:7, 8
	similarity score -> 0.8574842214584351

doc_1952: Praeder, Susan Marie (1984). Review Of Mark As Story: An Introduction To The Narrative Of A Gospel
	similarity score -> 0.8537847995758057

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.8523545265197754

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.852075457572937

doc_264: Ahearne-Kroll, Stephen P. (2010). Audience Inclusion And Exclusion As Rhetorical Technique In The Gospel Of Mark
	similarity score -> 0.839181661605835

doc_7865: Malbon, Elizabeth Struthers (1989). The Jewish Leaders In The Gospel Of Mark: A Literary Study Of Marcan Characterization
	similarity score -> 0.8106918931007385

doc_6706: Boomershine, Thomas E. (1981). Mark 16:8 And The Apostolic Commission
	similarity score -> 0.7990190982818604

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.7910127639770508

doc_3857: Robbins, Vernon K. (1973). The Healing Of Blind Bartimaeus (10:46-52) In The Marcan Theology
	similarity score -> 0.7885808348655701

*********************************
Average similarity score -> 0.8319473385810852

Information Retrieval: model_alpha_auto

In [16]:
retrieval_test(dinkler, model_alpha_auto, index_auto)
doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.9522599577903748

doc_1952: Praeder, Susan Marie (1984). Review Of Mark As Story: An Introduction To The Narrative Of A Gospel
	similarity score -> 0.8992863893508911

doc_1951: Kee, Howard Clark (1984). Review Of Jesus Walking On The Sea: Meaning And Gospel Functions Of Matt 14:22-23, Mark 6:45-52 And John 6:15B-21
	similarity score -> 0.870707631111145

doc_6960: Collins, Adela Yarbro (1994). Review Of Teaching With Authority: Miracles And Christology In The Gospel Of Mark
	similarity score -> 0.8650040626525879

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.8637354373931885

doc_7110: Stegner, William Richard (1995). Review Of Israel'S Scripture Traditions And The Synoptic Gospels: Story Shaping Story
	similarity score -> 0.8619969487190247

doc_8083: Anderson, Janice Capel (1992). Review Of Matthew'S Missionary Discourse: A Literary Critical Analysis
	similarity score -> 0.8462436199188232

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.8434828519821167

doc_7109: Moore, Stephen D. (1995). Review Of Deconstructing The New Testament
	similarity score -> 0.842341959476471

doc_7796: Malbon, Elizabeth Struthers (1993). Echoes And Foreshadowings In Mark 4-8 Reading And Rereading
	similarity score -> 0.8411325216293335

*********************************
Average similarity score -> 0.8686191380023957

Information Retrieval: model_alpha_05

In [17]:
retrieval_test(dinkler, model_alpha_05, index_05)
doc_7506: Moore, Stephen D. (1996). Review Of Reading Mark From The Outside: Eco And Iser Leave Their Marks
	similarity score -> 0.8930608034133911

doc_7978: Collins, Adela Yarbro (1993). Review Of Irony In Mark'S Gospel: Text And Subtext
	similarity score -> 0.8816823363304138

doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
	similarity score -> 0.8777000308036804

doc_7819: Collins, Adela Yarbro (1993). Review Of  "Eine Neue Lehre In Vollmacht": Die Streit- Und Schulgespräche Des Markus-Evangeliums 
	similarity score -> 0.8729690313339233

doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
	similarity score -> 0.8682205677032471

doc_9260: Iverson, Kelly R. (2011). A Centurion'S "Confession": A Performance-Critical Analysis Of Mark 15:39
	similarity score -> 0.8643099665641785

doc_7035: Green, Joel B. (1998). Review Of The Paradox Of Salvation: Luke'S Theology Of The Cross
	similarity score -> 0.8499305844306946

doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel
	similarity score -> 0.83594810962677

doc_5911: Nardoni, Enrique (1980). Review Of  La Transfiguración De Jesús Y El Diálogo Sobre Elías Según El Evangelio De San Marcos 
	similarity score -> 0.8314520120620728

doc_8446: Bautch, Richard J. (2004). Review Of Pontius Pilate: Portraits Of A Roman Governor
	similarity score -> 0.8301518559455872

*********************************
Average similarity score -> 0.8605425298213959

Brief Discussion: Finding Articles Similar to Dinkler, M. B. (2017). Building character on the road to Emmaus: Lukan characterization in contemporary literary perspective.

The average similarity scores of the top ten results for the third information retrieval task are as follows:

  • model_alpha_symmetric: average similarity score -> 83.2%
  • model_alpha_auto: average similarity score -> 86.9%
  • model_alpha_05: average similarity score -> 86.1%

Each topic model retrieved documents dealing with the Gospels, which is broadly appropriate for the query article. The average similarity scores of the three models are close to one another for this retrieval task. Three documents from the corpus were returned by all three models:

  • doc_8158: Tyson, Joseph B. (1988). Review Of The Lukan Voice: Confusion And Irony In The Gospel Of Luke
  • doc_8712: Brodie, Thomas L. (1999). Review Of The Discipleship Paradigm: Readers And Anonymous Characters In The Fourth Gospel
  • doc_312: Sylva, Dennis (2006). Review Of Dialogue And Drama: Elements Of Greek Tragedy In The Fourth Gospel

doc_8158 was ranked as the top match by model_alpha_symmetric and model_alpha_auto, and third by model_alpha_05.